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ABSTRACT 


Post-stroke patients need ongoing rehabilitation to restore dysfunction caused 
by an attack so that a monitoring device is required. EEG signals reflect 
electrical activity in the brain, which also informs the condition of post-stroke 
patient recovery. However, the EEG signal processing model needs to 
provide information on the post-stroke state. The development of deep 


learning allows it to be applied to the identification of post-stroke patients. 
This study proposed a method for identifying post-stroke patients using 
convolutional neural networks (CNN). Wavelet is used for EEG signal 
information extraction as a feature of machine learning, which reflects 
the condition of post-stroke patients. This feature is Delta, Alpha, Beta, 
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EEG signal Theta, and Mu waves. Moreover, the five waves, amplitude features are also 
Extraction added according to the characteristics of the post-stroke EEG signal. 
Post-stroke The results showed that the feature configuration is essential as distinguish. 
Wavelet The accuracy of the testing data was 90% with amplitude and Beta features 
compared to 70% without amplitude or Beta. The experimental results also 
showed that adaptive moment estimation (Adam) optimization model was 
more stable compared to Stochastic gradient descent (SGD). But SGD can 

provide higher accuracy than the Adam model. 
This is an open access article under the CC BY-SA license. 
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1. INTRODUCTION 

Stroke more often leaves disability than death. Stroke is the third-largest cause of disability in 
the world [1]. It is the number one cause of death in Indonesia based on the results of research published by 
the Health Research and Development Agency (Balitbangkes) in 2015. Therefore, medical rehabilitation 
is essential to restore post-stroke patients in independence to take care of them and carry out daily life 
activities. In stroke rehabilitation, there is a standard clinical procedure, namely the National Institute of 
Health Stroke Scale (NIHSS), to determine the current condition of patients [2]. The NIHSS tool can be an 
evaluation of the rehabilitation of stroke patients. NIHSS uses 11 indicators for post-stroke patients, namely 
the level of consciousness, eye gaze, vision, facial paralysis, hand and foot movements, joint inflammation, 
sensory abilities, language disorders, speech disorders, and sensory abilities. In each indicator, there is an 
assessment of the score between 0-4. But some variable use range 0-2. Scoring informs five levels of stroke, 
namely, score O no indication of a stroke at all (No Stroke), score 1-4 mild stroke (Minor Stroke), 
score 5-15 medium stroke (Moderate Stroke), score 16-20 medium stroke towards severe stroke, and 21-42 
severe strokes [3]. 
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Stroke can be detected by several instruments, such as CT scan, MRI, electroencephalogram (EEG), 
and EMG. CT scan and MRI are usually for the first treatment detection. The computational capabilities, 
automatically neurorehabilitation, can use EMG [4] and EEG [5]. EEG records reflect brain function. 
The instrument can provide supporting information in identifying and monitoring different neurological 
rehabilitation [5]. EEG has advantages in low cost and minimal risk to patients, but it is necessary to 
process to inform the brain's condition properly. Some studies used EEG signals to identify ischemic stroke 
patients [6], investigate that stroke patients are able to use BCI [7], and extracted significant variable [8]. 

EEG signal processing consists of two essential factors that are feature extraction and identification [9]. 
Usually, the neurologist reads the EEG signal of a post-stroke patient by observing wave density or rhythm, 
amplitude, asymmetric, change in magnitude, the presence of waves, and the ratio between waves. The most 
important thing is how to extract the signal into frequency components. The next step is to choose waves 
with the right frequency component under the characteristics of the EEG signal variable being reviewed [10]. 
Therefore, research using frequency extraction is beneficial. Several previous studies used Wavelet extraction 
to determine the significant variables of EEG signals for post-stroke patients [8]. Wavelet can extract Alpha, 
Beta, Theta, Gamma, and Mu waves to classify emotions of stroke patients [11] and healthy 
people [12]. Also, wavelet provides easy time-frequency analysis. Past research shows that the study of 
the time-frequency of stroke patients during TMS therapy [13]. Many other studies have also looked at 
the characteristics of waves with a specific frequency in stroke patients [14], which is one characteristic. 
Previous studies used Wavelet extraction of EEG signals to find significant variables of stroke [8] and 
the classification of emotions [15]. In meanwhile, FFT is suitable for identifying EEG signal asymmetries [16]. 

Machine learning allows computers to learn like humans make computers learn EEG stroke patterns 
previously so that automatic detection. Deep learning is part of machine learning, which in recent years has 
shown remarkable performance. Popular methods used in deep learning are convolutional neural network 
(CNN) and recurrent neural networks (RNN). Both have enough accuracy and are subject to composition. 
However, RNN is vulnerable to feature sequences, which can result in decreased correctness. Previous 
studies using a network similar to CNN, namely DBN, for the classification of slow cortical potentials in 
stroke [14]. CNN is widely used for image processing. An EEG signal can be viewed as an image, so its 
processing uses two-dimensional CNN, like previous studies for the detection of epileptic attacks [17]. 
However, not a few are also used for signal processing using one dimension. Some researchers used to 
identify EEG signals between stroke patients and healthy people [18], ischemic detection [19]. Another study 
to analyze epileptiform spikes on EEG signals [17] and motor imagery (MI) detection [20]. After undergoing 
rehabilitation, the post-stroke patient is expected to be healthy or have no stroke. NIHSS scales can be 
the standard for determining stroke levels from several indicators. But for the “minor” and “no stroke” levels, 
there is a slight difference. Therefore, we need another instrument, such as the EEG. This study proposed 
the CNN method for classifying EEG signals against post-stroke patients into both classes. The computational 
model of the EEG signal is built through wavelet extraction to get the right frequency range. EEG signals 
were extracted classified using CNN. 


2. RESEARCH METHOD 
The EEG signal classification of post-stroke patients used CNN, as shown in Figure 1, to identify 
"Minor Stroke," and "No Stroke" class. 
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Figure 1. Identification of EEG signal of the post-stroke patient model 
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2.1. Data acquisition 

EEG signal data is recorded using Emotiv Epoch+ with a sampling frequency of 128 Hz from 14 
channels, i.e., AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8. As subjects were 50 outpatients in 
the neurology clinic of RSIB hospital [8]. As a control group of the "No Stroke" class, recorded EEG signals 
from 25 healthy volunteers. Control groups are also used to identify strokes using portable EEG [16] 
and identified ischemic stroke [19]. At the rate 80% of all subjects as training data and 20% as non-training 
data. Each patient follows the instruction in Figure 2 to generate the appropriate waves. From 180 seconds of 
recording, the data used is only 120 seconds by removing 30 seconds beginning and ending. 


: MI right | MI left 
Instruction Open eyes aud hand Close eyes 
Time Processing 
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Figure 2. Instruction of data acquisition 


2.2. Wavelet extraction 

Wavelet transformation aims to split signals in two frequency ranges, approximation and detail or as 
frequency filter. The method is appropriate for non-stationary signals such as EEG [21]. Basis function ®(n) 
called the mother wavelet as (1); 


Dn) =2 O2 n-k) (1) 


where j and k are an integer that indicates the scaling and dilate of the basis function. It depends on the shape 
or position of the signal. ®(n) is the wavelet family. 

Signals with a 128 Hz sampling frequency may have a frequency component of 1-64 Hz or half. 
Each decomposition will be the half frequency signal as an approximation and a detail component. If it is 
repeated several steps, it can be extracted from Alpha or Mu waves, Theta, Delta wave, as in Figure 3. 
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Figure 3. Wavelet decomposition 


Discrete wavelet transformation passes a high-pass filter (detail), and a low-pass filter 
(approximation) signals shown in (2) and (3). 


Approximation = Vow (K) = Xn x(n).g(n— k) (2) 
Detail = Ynign (K) = Linx(n). g(n — k) (3) 


Where j and k are an integer that indicates the scaling and dilate of the basis function. It depends on the shape 
or position of the signal. O(n) is the wavelet family. The feature extraction of the EEG signal is obtained by 
convolution, which also decomposes the signal to several levels. A previous study used Symlet2 to eliminate 
noise and get the desired signal that is Gamma, Beta, Alpha, Theta, and Mu wave [8]. EEG signal wave 
extraction can use fast Fourier transform (FFT) or wavelet transformation. Wavelet capability is better for 
non-stationary signals such as EEG signals [22]. Wavelet extraction helps distinguish EEG signals, so that 
provide good accuracy [21]. 
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2.3. Convolution neural network 

CNN has the potential to process data in the time domain, such as an EKG signal [17]. In pattern 
recognition, what is very important is the extraction of EEG signals into features that match the classification 
variable. Some studies related to stroke using Alpha, Beta, Theta, and Delta waves [6, 20], other studies 
using brain symmetric index and relative frequency band ratio [23], frequency bands [24] and their 
comparisons [25], amplitude, rhythm [24] and MI [26]. This research used Delta, Theta, Alpha, Beta waves, 
MI, and amplitude based on a previous study [8]. Refer to Figure 2; the first 30 seconds during eye-opening 
contain Delta, Theta, Alpha, and Beta waves (1-32 Hz). As Figure 3, there are 64 points every second. 
Meanwhile, when the MI relation to Mu wave, that is 720 points. While the last 30 seconds raises the Alpha 
wave, which is 720 seconds. By adding the amplitude feature, the configuration as in Table 1. 

Amplitude takes average each 0.0625-second segment or 8 points per second. In Table 1, all features 
are on each channel, except for the Mu wave feature. So specifically, FCS and FC6 channels have 3.960 
points, while another channel has 3240 points. So there are 46.800 features. All features will normalize to 
reduce bias and get better learning [27]. In this study, normalization was carried out in a range of numbers 
between -1 and | to equalize the maximum and minimum values of each channel. 


Table 1. CNN feature 


No Time (second) Component Number of points Description 

1 30 Filter (1-32 Hz) 1.920 All channel 

2 30 Mu (8-13 Hz) 360 FCS5 and FC6 only 
3 30 Mu (8-13 Hz) 360 FCS5 and FC6 only 
4 30 Alpha (8-13 Hz) 360 All channel 

5 120 Average Amplitude 960 All channel 

Total 3.960 


CNN consists of a convolution layer for extracting features and classification layers for learning, as 
in Figure 4. Usually, the classification or identification layer uses the multi-layer perceptron (MLP) 
architecture. In previous studies, CNN was used to identify ischemic stroke patients and no stroke people 
throughout looking at EEG signals into one-dimensional objects [17, 18]. While in other studies, CNN was 
used to identify MI [28], and ECG [29]. The feature generated by this layer will be input into the CNN 
identification layer through training first. In learning, there are several ways to correct weights, such as 
stochastic gradient descent (SGD) or adaptive moment estimation (Adam). AdaDelta has a way of working 
similar to adaptive backpropagation, which has been used for BCI [30]. 
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Figure 4. Convolutional neural networks architecture 


2.3.1. Feature extraction layer 

The layers contained in feature extraction are useful for translating input into a feature based on 
the data in the form of a vector. This feature extraction layer consists of a convolution layer and a sub 
sampling (pooling) layer. The hyper parameters of the convolution are depth, stride, and zero-padding, as 
shown in (4). The output of this layer is used to the next process, namely the subsampling process of each m, 
map of the matrix with the size of m x n and kernel function C calculated in rows k and column y in the first 
convolution layer. 
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(Lm) = Kn-1 Kw-1 (Lm) (l—1) 
FM Gin =] oS hicr Cren TM erthea)) (4) 
F(x) = max(0,x) (5) 


Where K, is height, and K, is the length of the kernel. Then r; and çc; are the length and height of the kernel 
layer index l, and C is the convolution kernel. After the convolution process, then the next is the activation 
process using the rectified linear units (ReLU), which is useful for normalizing all negative values to 
zero using (5). 

Next, the downsampling or pooling layer serves to reduce the number of parameters from 
the feature map that will be used in the classification layer. Also, pooling speeds up computing and controls 
overfitting. Downsampling is accomplished by getting the value of a feature map with a determined size 
and stride. This study used Max pooling, which only regards the highest value in the feature map. 


2.3.2. Identification layer 

The second layer of CNN is an identification layer composed of several layers. Each layer is formed 
of fully connected neurons with other layers, known as multi-layer perceptron. Meanwhile, the learning 
process used some methods such as Backpropagation [28]. This layer handles the dropout process to 
overcome overfitting in training, which can cause a reduction [31]. Dropout works by temporarily removing 
a neuron in the form of hidden layers, input layers, or output layers that are in the network. The activation 
function, usually using Softmax that convert the output into a probability for each class, as shown in (6). 


Ta a (6) 


A loss function is used to measure the error between expectations of the target class compared to 
the actual data from the computational results. The loss function that can be used is Cross-Entropy (7). 
Where Loss is the distance, S is the result of the Softmax S, and L is the class label. 


Loss(S,L)=—> L,log(S;) (7) 


i 


3. RESULTS AND DISCUSSION 

EEG signals of post-stroke patients can be recognized by observing wave density or rhythm, 
amplitude and also differences in the magnitude of channel pair, and the presence of Alpha, Beta, Delta, and 
Theta waves. Therefore, learning features used Alpha, Beta, Delta, Teta wave, and amplitude. The key to the 
successful classification of EEG signals is extraction into frequency components. Therefore, Wavelet 
performance needs to be tested first. 


3.1. Wavelet extraction 

Wavelet extraction using Symlet2 at 0.5-32 Hz is illustrated in Figure 5. The blue line 1s the original 
signal, and the orange line is the result of a signal that has performed wavelet extraction. It can be seen that a 
low wave signal is obtained by decomposing to eliminate noise in the EEG signal via wavelet extraction. 
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Figure 5. Comparison between original signals with wavelet extraction 
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3.2. Compared between optimization model 

The classification that is performed using deep learning with the CNN model needs configuration to 
obtain high accuracy. In this study, using VGG19 architecture as a learning process. The architecture has 16 
convolution layers and five pooling layers. A suitable configuration is required to update the weights during 
the training so that they get maximum results. The weight correction can be achieved with various 
optimizers. Some of them are adaptive moment estimation (Adam) and stochastic gradient descent (SGD). 
This research optimizes the learning rate of 0.01 and 500 epochs. Accuracy and Loss value each optimizer 
are shown in Figure 6, Figure 7, and Table 2. The Adam model (orange color) was more stable than the SGD 
model (red color) for new data (valid) in the small epoch. But SGD has higher accuracy compared to Adam. 
Because SGD still changes the value of accuracy to find the best value compared to Adam [32]. 
The convergence of the Adam model to SGD is seen in Loss in Figure 7, so both values are consistent. 
Table 2 shows the optimization of SGD is better than Adam, although the loss value is higher. But it 
is not significant. 
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Figure 7. Loss of Adam and SGD model 


Besides, to accuracy, testing is performed on computational time in learning. This study carried out 
learning in 500 epochs with GPU Nvidia Geforce GTX 1050 4GB. A comparison of computing time between 
SGD and Adam models is shown in Table 2. The absence of Beta or amplitude waves, even though they 
provide low accuracy, are obtained with a shorter time. Based on Table 1, the number of amplitude points 
and Beta waves is the same, which gives the corresponding computational time reduction as Table 2. 


Table 2. Compare Adam and SGD model 


Configuration Model Training data New data 
Amplitude Beta Accuracy(%) Loss Accuracy (%) Loss Computation time (sec) 
v v Adam 100 0.0000 80.00 3.7571 180 
SGD 100 0.0007 90.00 0.8221 160 
x v Adam 100 0.0000 60.00 6.4120 140 
SGD 100 0.0009 70.00 1.9362 120 
v x Adam 100 0.0000 50.00 5.3427 140 
SGD 100 0.0007 70.00 1.0861 120 


3.3. Performance of amplitude and Beta wave 

The experiment showed that involve amplitude as a learning feature gave more accurate for both 
models, as shown in Table 2. From Figure 8, it can be seen that the accuracy of SGD is also higher than 
Adam. And Adam model is more stable too than SGD for configurations without amplitude. CNN works 
based on the convolution of time series data with the kernel without affording a direct connection between 
sequential data. Therefore, the use of wave features and amplitude still provides higher accuracy. 
This experiment is consistent with previous research [8]. 
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Figure 8. Accuracy of Adam and SGD without amplitude 


Although the presence of low-frequency wave dominance characterizes strokes, Beta waves also 
contribute to MI in addition to Mu waves [26]. Therefore its appearance is also significant for increasing 
accuracy. The result showed that the accuracy of the SGD model is higher too than the Adam, without 
the presence of Beta waves model, as shown in Figure 9. 
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Figure 9. Accuracy of Adam and SGD without Beta wave 


4. CONCLUSION 

This study identified two levels of stroke from NIHSS using wavelet and CNN with training data 
accuracy of 100% and non-training data at 90%. The essential thing in identification is the proper 
configuration of input features according to the identified variable, which is stroke. Using low waves (Delta 
and Theta), Alpha, motor imagery (Beta and Mu), and amplitude are significant in the classification of 
“minor strokes” to “no strokes”, thus providing good accuracy. Identification using CNN also needs to pay 
attention to optimization models that provide high correctness and stability. Sometimes, both of them are 
contradictive, so accuracy considerations are preferred. The SGD model is more oscillating at the beginning 
of the epoch but provided better accuracy than the Adam model. 

In the future, we need to find the configuration of the channel. Appropriate channel processing can 

increase accuracy and reduce computing time that is more realistic to implement. Also worth noting 
is the setting and sequence of features that provide high accuracy and relatively low computing time. 
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